Characterization of simultaneous multithreading (SMT) efficiency in POWER5

Authors

  • Harry M. Mathis
  • Alex E. Mericas
  • John D. McCalpin
  • Richard J. Eickemeyer
  • Steven R. Kunkel
Abstract

Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5 processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible to ensure the maximum use of processor resources. Because SMT has the potential of increasing processor efficiency and correspondingly increasing the amount of work done for a given time span, the reader might suppose that SMT would exhibit a performance gain for all workloads. This is true for most workloads, but is not true in some exceptional cases. In SMT mode, the processor resources (register sets, caches, queues, translation buffers, and the system memory nest) must be shared by both threads, and conditions can occur that degrade or even obviate SMT performance improvement. The POWER4 and POWER5 processors have very powerful performance monitor (PM) toolsets that can help the user to determine what is occurring in workloads that may not be providing expected SMT gains. In this paper, the results of measured differences among workloads having large, medium, small, and even negative SMT performance gains are presented along with an approach to investigating workloads to determine the source of SMT performance gain limits.

Similar resources

A Study of the Influence of the POWER5 Dynamic Resource Balancing (DRB) on Optimal Hardware Thread Priorities

Simultaneous Multithreading, often abbreviated SMT, is a technique for improving the overall efficiency of superscalar processors with hardware multithreading. SMT permits a processor to concurrently execute multiple independent instruction streams every clock cycle, potentially improving processor throughput. However, this can introduce contention for shared resources amongst threads running c...

Operating system exploitation of the POWER5 system

The POWER5 system incorporates several features designed to improve performance by eliminating bottlenecks and accelerating common functions used in operating systems. This paper discusses how two of the supported operating systems for POWER5, AIX and Linux, make use of these features to deliver improved system scalability and performance. In particular, the overheads for synchronizing transla...

Evaluating the Thermal Efficiency of SMT and CMP Architectures

Simultaneous multithreading (SMT) and chip multiprocessing (CMP) both allow a chip to achieve greater throughput, but their thermal properties are still poorly understood. This paper uses Turandot, PowerTimer, and HotSpot to evaluate the thermal efficiency for a Power4/Power5-like core. Our results show that although SMT and CMP exhibit similar peak operating temperatures, the mechanism by whic...

Enhancing the Performance of Multigrid Smoothers in Simultaneous Multithreading Architectures

In this paper we address the implementation of red-black multigrid smoothers on high-end microprocessors. Most previous work on this topic has focused on cache memory issues because of their tremendous impact on performance. Here we extend these studies to take Simultaneous Multithreading (SMT) into account. With the introduction of SMT, new possibilities arise, ...

Integrating Multiple Forms of Multithreaded Execution on SMT Processors: A Quantitative Study with Scientific Workloads

Simultaneous multithreaded (SMT) processors have penetrated the mainstream computing market, since they offer a number of cost/performance advantages over conventional superscalar processors at a nominal additional cost. Simultaneous multithreading can be used in the execution engine of a single monolithic microprocessor, or be embedded and replicated in the execution cores of a chip multipro...

Journal:
  • IBM Journal of Research and Development

Volume 49

Publication date: 2005